In this portfolio I examine how AI-generated dance music compares to human-made dance music, using a dataset of both kinds of tracks from the course Computational Musicology. The tracks belong to a collection (corpus) of music that was composed by students of the course, generated by AI, or taken from existing royalty-free music. The features in the tables below, and the values assigned to them, were extracted with Essentia, an open-source C++ library for audio analysis and audio-based music information retrieval. Every track in the tables was analysed with this library. The second table is a subset of the dataset that contains only the songs I considered to be EDM, so that I can compare electronic dance tracks with each other. The features mean the following:
Approachability reflects how pleasant and easy a song is to listen to.
Arousal measures its energy level, with higher values indicating more intensity.
Danceability assesses how well a track is suited for dancing, based on rhythm, beat strength, and tempo.
Tempo indicates the speed of the song, measured in beats per minute (BPM).
Engagingness shows how likely a track is to hold the listener’s attention.
Instrumentalness estimates how instrumental a track is, with higher values suggesting fewer vocals.
Valence describes the overall mood of the song, where higher values correspond to more positive and cheerful tones, while lower values indicate a more subdued or serious sound.
These features together provide a clear overview of each track’s musical profile, making it easier to analyze and compare songs.

| filename | approachability | arousal | danceability | engagingness | instrumentalness | tempo (BPM) | valence | ai |
|---|---|---|---|---|---|---|---|---|
| ahram-j-1 | 0.2991498 | 3.417260 | 0.2711799 | 0.1026429 | 0.9141049 | 84 | 4.016967 | TRUE |
| ahram-j-2 | 0.1889460 | 4.459196 | 0.4690239 | 0.5624804 | 0.3271964 | 95 | 3.767471 | TRUE |
| aleksandra-b-1 | 0.1644350 | 5.343031 | 0.8357580 | 0.5665221 | 0.3702452 | 68 | 4.738314 | FALSE |
| aleksandra-b-2 | 0.2511401 | 3.680455 | 0.6918470 | 0.1301249 | 0.8842366 | 104 | 4.044941 | TRUE |
| angelo-w-1 | 0.1614367 | 3.621579 | 0.7069914 | 0.3248783 | 0.7907066 | 140 | 3.301473 | FALSE |

| filename | approachability | arousal | danceability | engagingness | instrumentalness | tempo (BPM) | valence | ai |
|---|---|---|---|---|---|---|---|---|
| berend-b-1 | 0.1450785 | 5.021568 | 0.7396224 | 0.5278043 | 0.5858963 | 143 | 4.429538 | TRUE |
| berend-b-2 | 0.2117881 | 5.656832 | 0.6107739 | 0.5786535 | 0.3487158 | 75 | 4.476577 | TRUE |
| desmond-l-1 | 0.2629817 | 4.478108 | 0.2859525 | 0.4156072 | 0.6434987 | 135 | 3.936315 | TRUE |
| desmond-l-2 | 0.2929443 | 5.076702 | 0.3010519 | 0.5524329 | 0.4989389 | 73 | 4.316221 | TRUE |
| evan-l-2 | 0.1081999 | 5.602334 | 0.4800247 | 0.6272448 | 0.5513844 | 135 | 4.445124 | TRUE |
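
The EDM subset is a manual selection, so one way to reproduce it is to keep a hand-made set of track names and filter the corpus against it. The sketch below is only an illustration of that idea, not the method actually used for this portfolio: the rows are copied from the two tables above (three columns only), and the names `corpus`, `edm_tracks` and `filter_by_label` are hypothetical.

```python
# Rows copied from the two tables above; only three columns are kept here.
corpus = [
    {"filename": "ahram-j-1", "tempo": 84, "ai": True},
    {"filename": "ahram-j-2", "tempo": 95, "ai": True},
    {"filename": "aleksandra-b-1", "tempo": 68, "ai": False},
    {"filename": "aleksandra-b-2", "tempo": 104, "ai": True},
    {"filename": "angelo-w-1", "tempo": 140, "ai": False},
    {"filename": "berend-b-1", "tempo": 143, "ai": True},
    {"filename": "berend-b-2", "tempo": 75, "ai": True},
    {"filename": "desmond-l-1", "tempo": 135, "ai": True},
    {"filename": "desmond-l-2", "tempo": 73, "ai": True},
    {"filename": "evan-l-2", "tempo": 135, "ai": True},
]

# Hypothetical hand-made label set: the tracks listed in the second (EDM) table.
edm_tracks = {"berend-b-1", "berend-b-2", "desmond-l-1", "desmond-l-2", "evan-l-2"}


def filter_by_label(rows, labelled):
    """Keep only the rows whose filename is in the hand-labelled set."""
    return [row for row in rows if row["filename"] in labelled]


edm_corpus = filter_by_label(corpus, edm_tracks)
for row in edm_corpus:
    print(row["filename"], row["tempo"], row["ai"])
```

Keeping the labels in a separate set makes the subjective step explicit: anyone who disagrees with the genre judgement can change the set without touching the feature data.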
Information on my submitted tracks
Hidde-s-1:
I produced this song myself. I make music with clubs and festivals in mind, as I like to DJ. For this track I tried to combine a mainstream house sound with some rawer electronic sounds.
Hidde-s-2:
This is a track I generated with Suno. I asked ChatGPT what the key characteristics of a dance track in a sweaty club in Amsterdam were, and it answered: “Punchy four-on-the-floor kick, deep rolling bass, crisp shuffled hi-hats, sharp claps, detuned wide synth leads, tension-filled breakdown, rising FX, massive sidechained drop, high-energy, club-focused groove.”
My tracks in the class corpus
This graph plots the engagingness of each song against its danceability. The colour scale shows the tempo of each song. The first noticeable aspect of the graph is the seemingly positive correlation between danceability and engagingness, shown by the red trend line: on average, a song with a high danceability value also has a high engagingness rating. The colour scale also suggests that most songs scoring high on both features have a higher tempo. This could mean that these features are genuinely related, or that Essentia computes them in similar ways. It would be interesting to look at why this correlation appears, for instance by examining the role of instrumentalness or genre in combination with this analysis.
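
To put a number on the visual impression, the correlation and a straight trend line can be computed directly. The sketch below assumes the red line is an ordinary least-squares fit, which is my assumption and not something the plot itself confirms. It uses only the ten tracks shown in the tables earlier, so its result will differ from the full-corpus graph; the function names are hypothetical.

```python
from math import sqrt

# (danceability, engagingness) for the ten tracks listed in the tables.
pairs = [
    (0.2711799, 0.1026429), (0.4690239, 0.5624804), (0.8357580, 0.5665221),
    (0.6918470, 0.1301249), (0.7069914, 0.3248783), (0.7396224, 0.5278043),
    (0.6107739, 0.5786535), (0.2859525, 0.4156072), (0.3010519, 0.5524329),
    (0.4800247, 0.6272448),
]


def _sums(xs, ys):
    """Return the means plus the summed (co)deviations used by both functions."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, cov, var_x, var_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    _, _, cov, var_x, var_y = _sums(xs, ys)
    return cov / sqrt(var_x * var_y)


def trend_line(xs, ys):
    """Least-squares slope and intercept of y regressed on x."""
    mean_x, mean_y, cov, var_x, _ = _sums(xs, ys)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


dance = [d for d, _ in pairs]
engage = [e for _, e in pairs]
r = pearson_r(dance, engage)
slope, intercept = trend_line(dance, engage)
print(f"r = {r:.3f}, engagingness = {slope:.3f} * danceability + {intercept:.3f}")
```

A value of `r` close to 1 would support the strong relationship the trend line suggests, while a value near 0 would mean the line is mostly driven by a few tracks.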
The two highlighted points are a song I arranged myself and one I generated with Suno. My own song scores higher on both danceability and engagingness than the AI song, even though they are in the same genre and were made with the same intention. Not much can be concluded from this single example, but it leads to the hypothesis that AI-generated music is distinguishable from human-made music, which I evaluate in the next tabs.
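
A simple first check of this hypothesis is to compare average feature values between the two groups. The sketch below is only an illustration under stated assumptions: it uses the ten tracks from the tables (eight marked as AI, two as human), which is far too few to draw conclusions from, and `group_means` is a hypothetical helper, not part of the later analysis.

```python
from statistics import mean

# (filename, danceability, engagingness, ai) copied from the tables.
tracks = [
    ("ahram-j-1", 0.2711799, 0.1026429, True),
    ("ahram-j-2", 0.4690239, 0.5624804, True),
    ("aleksandra-b-1", 0.8357580, 0.5665221, False),
    ("aleksandra-b-2", 0.6918470, 0.1301249, True),
    ("angelo-w-1", 0.7069914, 0.3248783, False),
    ("berend-b-1", 0.7396224, 0.5278043, True),
    ("berend-b-2", 0.6107739, 0.5786535, True),
    ("desmond-l-1", 0.2859525, 0.4156072, True),
    ("desmond-l-2", 0.3010519, 0.5524329, True),
    ("evan-l-2", 0.4800247, 0.6272448, True),
]


def group_means(rows):
    """Average danceability and engagingness per value of the ai flag."""
    groups = {}
    for _, dance, engage, is_ai in rows:
        groups.setdefault(is_ai, []).append((dance, engage))
    return {
        is_ai: {
            "n": len(values),
            "danceability": mean(d for d, _ in values),
            "engagingness": mean(e for _, e in values),
        }
        for is_ai, values in groups.items()
    }


for is_ai, stats in group_means(tracks).items():
    label = "AI" if is_ai else "human"
    print(label, stats["n"], round(stats["danceability"], 3), round(stats["engagingness"], 3))
```

A difference in group means alone does not show that the groups are distinguishable; the spread within each group and the group sizes matter just as much.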